The figure shows a selection. Each nucleotide represents two bits in the sense of Shannon
coding (Shannon entropy), because there are four possible bases. If you look at proteins, there are 20
amino acids encoded with 64 codons of three nucleotides each, i.e. 6 bits per codon (because 2 to the power of 6 or 2**6 is 64).
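The bit counts above can be checked with a few lines of Python. This is only a minimal sketch: it assumes that all symbols are equally likely (real sequences are not), and the function name `bits_per_symbol` is chosen here purely for illustration.

```python
import math

def bits_per_symbol(alphabet_size: int) -> float:
    """Shannon information of one symbol, assuming all symbols are equally likely."""
    return math.log2(alphabet_size)

nucleotide_bits = bits_per_symbol(4)    # four bases: A, C, G, T
codon_bits = 3 * nucleotide_bits        # a codon is three nucleotides
amino_acid_bits = bits_per_symbol(20)   # 20 amino acids need fewer bits than a codon offers

print(nucleotide_bits, codon_bits, round(amino_acid_bits, 2))
```

Under this assumption a codon carries 6 bits, while choosing one of 20 amino acids needs only a little more than 4 bits; the difference reflects the redundancy of the genetic code.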
The three-dimensional protein structure code is much more complex. There are so
many possibilities here that the information value of a defined protein structure is very
high (as a simplified estimate, take the number of bits in a downloaded PDB structure file,
which already amounts to hundreds of thousands of bits). A computationally clever approach
is to use internal coordinates to encode protein structures with few bits: only the
path from one amino acid to the next is specified. This can be done with the angles
phi and psi at the central carbon atom (alpha-C atom) of each amino acid (AlQuraishi
2019). If I then use four or eight standard conformations to merely represent the protein
structure in a highly simplified way, I only need 2 or 3 bits for each amino acid position in
a protein folding simulation (Saxena et al. 1997).
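The idea of a few standard conformations per residue can be sketched as follows. This is an illustration under stated assumptions, not the method of Saxena et al. (1997): the four (phi, psi) pairs in `STATES` are hypothetical representative values, and the names `nearest_state` and `encode_backbone` are made up for this example.

```python
import math

# Hypothetical "standard" conformations as (phi, psi) in degrees.
# These values are illustrative assumptions, not taken from the cited work.
STATES = [(-60.0, -45.0), (-120.0, 130.0), (60.0, 45.0), (-75.0, 145.0)]

def angle_diff(a: float, b: float) -> float:
    """Smallest absolute difference between two angles in degrees."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)

def nearest_state(phi: float, psi: float, states=STATES) -> int:
    """Index of the standard conformation closest to (phi, psi)."""
    return min(
        range(len(states)),
        key=lambda i: angle_diff(phi, states[i][0]) ** 2
        + angle_diff(psi, states[i][1]) ** 2,
    )

def encode_backbone(angles, states=STATES) -> str:
    """Encode a list of (phi, psi) pairs as a bit string, one fixed-width code per residue."""
    bits = math.ceil(math.log2(len(states)))  # 4 states -> 2 bits, 8 states -> 3 bits
    return "".join(
        format(nearest_state(phi, psi, states), f"0{bits}b") for phi, psi in angles
    )

# Three residues: helix-like, strand-like, helix-like (made-up angles).
code = encode_backbone([(-57.0, -47.0), (-119.0, 113.0), (-63.0, -41.0)])
print(code, len(code), "bits")
```

With four states each residue costs 2 bits, so the three example residues are stored in 6 bits; extending `STATES` to eight entries would raise the cost to 3 bits per residue.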
Beyond these, there are other codes, for example at the cell membrane (membrane lipids, but
also specific membrane modifications), the RNA sequence-structure code within the cell
for regulatory RNA, metabolic regulation (e.g. iron) as well as localisation in the cell, and
the sugar code at the cell surface, with which cells recognise each other and via
which transplant rejection is also encoded. Finally, there are membrane lipids that, for example
via gangliosides and cerebrosides (i.e. sugar-lipid structures), assign the wiring in the
brain and different neuronal structures to each other in detail in order to ensure the
plasticity of our brain during embryonic development.
All these codes are not only used and needed in the cell, but you can also decode them
with bioinformatics, especially via sequence analysis.
In this way, it is possible to translate the fairly universal genetic code (program
“Translate” from the Expert Protein Analysis System, EXPASY, at the “Swiss Institute of
Bioinformatics”, https://web.expasy.org/translate/) and to better understand its rarer variants
for certain codons, for example in mitochondria, some bacteria and also protozoa (Heaphy
et al. 2016; https://www.ncbi.nlm.nih.gov/Taxonomy/Utils/wprintgc.cgi). Similarly,
signals in regulatory RNA can be analyzed, for example with the RNA analyzer
(https://rnaanalyzer.bioapps.biozentrum.uni-wuerzburg.de/), as can sugar codes
(https://www.functionalglycomics.org/; https://ncfg.hms.harvard.edu/) or codes in
lipids, for example to assign lipids to the correct type after mass spectrometry (Ahmed
et al. 2015).
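How a translation program handles the standard code and one of its rarer variants can be sketched in plain Python. This is a minimal illustration and not the implementation behind the EXPASY "Translate" tool: the two amino acid strings follow the compact notation of the NCBI genetic code tables (table 1, standard; table 2, vertebrate mitochondrial), while the function names and the example sequence are assumptions made for this sketch.

```python
from itertools import product

BASES = "TCAG"  # base order used in the NCBI compact table notation

# One amino acid letter per codon, codons ordered TTT, TTC, TTA, TTG, TCT, ...
STANDARD = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
VERTEBRATE_MITO = "FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSS**VVVVAAAADDEEGGGG"

def codon_table(amino_acids: str) -> dict:
    """Map each of the 64 codons to its amino acid letter ('*' marks a stop)."""
    codons = ("".join(c) for c in product(BASES, repeat=3))
    return dict(zip(codons, amino_acids))

def translate(seq: str, table: dict) -> str:
    """Translate reading frame 1; unknown codons become 'X', trailing bases are ignored."""
    seq = seq.upper().replace("U", "T")
    usable = len(seq) - len(seq) % 3
    return "".join(table.get(seq[i:i + 3], "X") for i in range(0, usable, 3))

standard = codon_table(STANDARD)
mito = codon_table(VERTEBRATE_MITO)

# Made-up sequence containing the four codons that differ between the two tables.
dna = "ATGATATGAAGA"
print(translate(dna, standard))  # ATA = I, TGA = stop, AGA = R
print(translate(dna, mito))      # ATA = M, TGA = W, AGA = stop
```

The same twelve nucleotides thus give different proteins depending on the table, which is why the correct code variant has to be chosen before translating mitochondrial or protozoan sequences.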
7.3 Understanding Coding Better
So what can we take away as insights? It’s a lot like a conversation in a busy pub. The
signals of the cell are constantly fighting against the background noise. Apart from our
own signalling cascade, which we are currently interested in, such as the Erk kinase